RL algorithm
Simple random search of static linear policies is competitive for reinforcement learning
Model-free reinforcement learning aims to offer off-the-shelf solutions for controlling dynamical systems without requiring models of the system dynamics. We introduce a model-free random search algorithm for training static, linear policies for continuous control problems. Common evaluation methodology shows that our method matches state-of-the-art sample efficiency on the benchmark MuJoCo locomotion tasks. Nonetheless, more rigorous evaluation reveals that the assessment of performance on these benchmarks is optimistic. We evaluate the performance of our method over hundreds of random seeds and many different hyperparameter configurations for each benchmark task. This extensive evaluation is possible because of the small computational footprint of our method. Our simulations highlight high variability in performance on these benchmark tasks, indicating that commonly used estimates of sample efficiency do not adequately evaluate the performance of RL algorithms. Our results stress the need for new baselines, benchmarks, and evaluation methodology for RL algorithms.
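A minimal sketch of what random search over a static linear policy can look like. The `ToyLQR` environment, function names, and hyperparameters below are illustrative assumptions, not the authors' code; the paper's full method adds refinements (e.g., state normalization and scaling the update by the standard deviation of collected rewards).

```python
import numpy as np

class ToyLQR:
    """Tiny linear system with quadratic cost; exists only to make the sketch runnable."""
    def __init__(self, dim=3, seed=0):
        rng = np.random.default_rng(seed)
        self.A = 0.9 * np.eye(dim) + 0.05 * rng.standard_normal((dim, dim))
        self.dim = dim

    def reset(self):
        self.s = np.ones(self.dim)
        return self.s

    def step(self, a):
        self.s = self.A @ self.s + a          # fully actuated: B = I
        reward = -float(self.s @ self.s + 0.1 * a @ a)
        return self.s, reward, False

def rollout(theta, env, horizon=100):
    """Return of one episode under the deterministic linear policy a = theta @ s."""
    s, total = env.reset(), 0.0
    for _ in range(horizon):
        s, r, done = env.step(theta @ s)
        total += r
        if done:
            break
    return total

def random_search(env, state_dim, action_dim,
                  iters=100, n_dirs=8, nu=0.05, alpha=0.02, seed=0):
    """Basic random search: probe symmetric perturbations of theta and
    step along a finite-difference estimate of the reward gradient."""
    rng = np.random.default_rng(seed)
    theta = np.zeros((action_dim, state_dim))
    for _ in range(iters):
        deltas = rng.standard_normal((n_dirs, action_dim, state_dim))
        r_plus = np.array([rollout(theta + nu * d, env) for d in deltas])
        r_minus = np.array([rollout(theta - nu * d, env) for d in deltas])
        theta += (alpha / n_dirs) * np.tensordot(r_plus - r_minus, deltas, axes=1)
    return theta

theta = random_search(ToyLQR(), state_dim=3, action_dim=3)  # example usage
```

The symmetric ±ν probes form a finite-difference estimate of the policy-gradient direction, so the method needs only episode returns: no value function, no backpropagation, which is what keeps its computational footprint small.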
Hybrid Reinforcement Learning Breaks Sample Size Barriers in Linear MDPs
Kevin Tan, Wei Fan, Yuting Wei
Department of Statistics and Data Science, The Wharton School, University of Pennsylvania
Hybrid Reinforcement Learning (RL), where an agent learns from both an offline dataset and online exploration in an unknown environment, has garnered significant recent interest. A crucial question posed by Xie et al. (2022b) is whether hybrid RL can improve upon the existing lower bounds established for purely offline or online RL without requiring that the behavior policy visit every state and action the optimal policy does. While Li et al. (2023b) provided an affirmative answer for tabular PAC RL, the question remains unsettled for both the regret-minimizing and non-tabular cases. In this work, building upon recent advancements in offline RL and reward-agnostic exploration, we develop computationally efficient algorithms for both PAC and regret-minimizing RL with linear function approximation, without requiring concentrability on the entire state-action space. We demonstrate that these algorithms achieve sharper error or regret bounds that are no worse than, and can improve on, the optimal sample complexity in offline RL (the first algorithm, for PAC RL) and online RL (the second algorithm, for regret-minimizing RL) in linear Markov decision processes (MDPs), regardless of the quality of the behavior policy. To our knowledge, this work establishes the tightest theoretical guarantees currently available for hybrid RL in linear MDPs.
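For readers unfamiliar with the setting, the linear MDP model referenced above is the standard one in which a known feature map makes transitions and rewards linear. A sketch of the conventional definition, with notation assumed from the standard literature (e.g., Jin et al., 2020) rather than taken from this paper:

```latex
% Conventional linear MDP definition (standard notation, assumed rather
% than quoted from this paper): for a known feature map
% \phi : \mathcal{S} \times \mathcal{A} \to \mathbb{R}^d,
\begin{align*}
  P_h(s' \mid s, a) &= \langle \phi(s, a), \mu_h(s') \rangle, \\
  r_h(s, a)         &= \langle \phi(s, a), \theta_h \rangle,
\end{align*}
% where \mu_h is a vector of d (signed) measures over states and
% \theta_h \in \mathbb{R}^d. A useful consequence: the action-value
% function of any policy is linear in \phi, which is what makes
% least-squares-style algorithms tractable in this setting.
```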